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Abstract 



This article presents near-optimal guarantees for accurate and robust image recovery from 
under-sampled noisy measurements using total variation minimization, and our results may 
be the first of this kind. In particular, we show that from 0(s log(A^)) nonadaptive linear 
measurements, an image can be reconstructed to within the best s-term approximation of its 
gradient, up to a logarithmic factor. Along the way, we prove a strengthened Sobolev inequality 
for functions lying in the null space of a suitably incoherent matrix. 

1 Introduction 

Compressed sensing (CS) provides the technology to exploit sparsity when acquiring signals of 
general interest, ahowing for accurate and robust signal acquisition from surprisingly few mea- 
surements. Rather than acquiring an entire signal and then later compressing, CS proposes a 
mechanism to collect measurements in compressed form, skipping the often costly step of complete 
acquisition. The applications are numerous, and range from image and signal processing to remote 
sensing and error correction j20| . 

In compressed sensing one acquires a signal x ^ via m <^ d linear measurements of the 
form = {cj)f,,x) + z^- The vectors 0^ form the rows of the measurement matrix and the 
measurement vector y £ C"^ can thus be viewed in matrix notation as 



where z is the noise vector modeling measurement error. We then ask to recover the signal of 
interest x from the noisy measurements y. Since m <^ d this problem is ill-posed without further 
assumptions. However, signals of interest in applications contain far less information than their 
dimension d would suggest, often in the form of sparsity or compressibility in a given basis. We 
call a vector x s-sparse when 



Compressible vectors are those which are approximated well by sparse vectors. 

In the simplest case, if we know that x is s-sparse and the measurements are free of noise, then 
the inverse problem y = ^x is well-posed if the measurement matrix $ is one-to-one on sparse 
vectors. To recover x £ from y G C™' we solve the optimization problem 



y = ^x + z 





X = argmin ||w||o such that = y. 




w 
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If ^ is one-to-one on s-sparse vectors, then (Lq) recovers exactly any s-sparse signal x = x. The 
optimization problem (Lq) however is in general NP-Hard |37j so we instead consider its relaxation 
to the ^i-norm, 

X = argmin \\w\\i such that \\^w — y\\2 < s, (Li) 

w 

1/2 

where \\w\\i = \ wi\, \\w\\2 = i^iwf) denotes the standard Euclidean norm, and e bounds 
the noise level ||z||2 < £■ The problem (Li) may be cast as a linear program and can thus be solved 
efficiently using modern linear programming methods |17| [2T] . 

If we require that the measurement matrix is not only one-to-one on s-sparse vectors, but 
moreover an approximate isometry on s-sparse vectors, then remarkably (Li) will still recover any 
s-sparse signal exactly. Candes et.al. introduced the restricted isometry property (RIP) and showed 
that this requirement on the measurement matrix $ guarantees robust recovery of compressible 
signals via (Li) fT2]. 



Definition 1. A matrix $ G C™'^'^ is said to have the restricted isometry property of order s and 
level 6 G (0, 1) if 

(1 -(5)||a;||^ < < (l + 5)||a;||| for all s-sparse x e (2) 

The smallest such 5 for which this holds is denoted by 6s and called the restricted isometry constant 
for the matrix 

When 62s < 1, the RIP guarantees that no 2s-sparse vectors reside in the null space of 
When a matrix has a small restricted isometry constant, $ acts as a near- isometry over the subset 
of s-sparse signals. 

Many classes of random matrices can be used to generate matrices having small RIP constants. 
With probability exceeding 1 — e"*^™, a matrix whose entries are i.i.d. appropriately normalized 
Gaussian random variables has a small RIP constant 6s < c when m > c~^slog{d/s). This number 
of measurements is also shown to be necessary for the RIP [30j. More generally, the restricted 
isometry property holds with high probability for any matrix generated by a subgaussian random 
variable |13| [35l B5l . One can also construct matrices with the restricted isometry property using 
fewer random bits. For example, if m > slog^(ci) then the restricted isometry property holds with 
high probability for the random subsampled Fourier matrix Fq E C™^'^, formed by restricting the 
d X d discrete Fourier matrix to a random subset of m rows and re-normalizing |45j . The RIP also 
holds for randomly subsampled bounded orthonormal systems [43^ and randomly-generated 
circulant matrices [42]. 

Candes, Romberg, and Tao showed that when the measurement matrix $ satisfies the RIP with 
sufficiently small restricted isometry constant, (Li) produces an estimation x to x with error [TT] , 

\\x-x\\2<c(^^^^^p^ + e). (3) 



This error rate is optimal on account of classical results about the Gel'fand widths of the ii ball 
due to Kashin [28] and Garnaev-Gluskin |25J. 

Here and throughout, Xg denotes the vector consisting of the largest s coefficients of x in 
magnitude. Similarly, for a set 5", xs denotes the vector (or matrix, appropriately) of x restricted 
to the entries indexed by S. The bound ([3| then says that the recovery error is proportional 
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to the noise level and the norm of the tail of the signal, x — Xs- In particular, when the signal 
is exactly sparse and there is no noise in the measurements, (Li) recovers x exactly. We note 
that for simplicity, we restrict focus to CS decoding via the program (Li), but acknowledge that 
other approaches in compressed sensing such as Compressive Sampling Matching Pursuit [38] and 
Iterative Hard Thresholding [4j yield analogous recovery guarantees. 

Signals of interest are often compressible with respect to bases other than the canonical basis. 
We consider a vector x to be s-sparse with respect to the basis B if 

X = Bz for some s-sparse z, 

and X is compressible with respect to this basis when it is well approximated by a sparse represen- 
tation. In this case one may recover x from CS measurements y = ^x + ^ using the modified ii 
minimization problem 

a; = argmin such that — y||2<e {BLi) 

w 

As before, the recovery error ||a; — x||2 is proportional to the noise level and the norm of the tail of 
the signal if the composite matrix ^ = ^B satisfies the RIP. If S is a fixed orthonormal matrix and 
$ is a random matrix generated by a subgaussian random variable, then ^ = ^B has RIP with 
high probability with m > s\og{d/s) due to the invariance of norm-preservation for subgaussian 
matrices J2j. More generally, following the approach of ^ and applying Proposition 3.2 in [30j, this 
rotation-invariance holds for any $ the restricted isometry property and randomized column signs. 
The rotational-invariant RIP also extends to the classic ^i-analysis problem which solves {BLi) 
when B* is a tight frame [8]. 

1.1 Imaging with CS 

Grayscale digital images do not fill the entire space oi N x N blocks of pixel values, consisting 
primarily of slowly-varying pixel intensities except along edges. In other words, digital images are 
compressible with respect to their discrete gradient. Concretely, we denote an x block of 
pixels by X S C'^^^, and we write Xj^k to denote any particular pixel. The discrete directional 
derivatives of X € C^^^ are defined pixel- wise as 



X, : 
Xy : 



^NxN _^ ^{N~l)xN 
^NxN (pNx{N-l) 



The discrete gradient transform V : C^^^ 
derivatives and in matrix form. 



{Xx)j,k — Xj+l,fc — -^j,k 

(^NxNx2 -g figf^j^gfi terms of the directional 



(4) 
(5) 



' l<j<Af-l, l<k<N-l 

(0, {Xy)j^k), 

{{Xx)j,k,0), 

[ (0,0), 



j = N, l<k<N -I 
k = N, l<j<N-l 
j = k = N 



Finally, the total-variation semi-norm is just the sum of the magnitudes of its discrete gradient. 



\X\ 



TV 



IV [XI 



1- 



(6) 
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We note here that the choice of [{Xx)j^k, {^y)j.k) in the definition of [V[X]]^. ^ leads to the 
anisotropic version of the total variation norm. The isotropic version of the total variation norm 
stems from instead the choice of {Xx)j^k + i{^y)j,k in the definition of the discrete gradient. In the 
isotropic case, becomes the sum of terms 

\{Xx)j,k + i{Xy)j^k\ = {{Xx)\k + {^y)\kY^'^ ■ 

The isotropic and anisotropic induced total variation norms are thus equivalent up to a factor of 
\/2- We emphasize here that our method applies to both anisotropic and isotropic total variation. 
However, we will consider only the anisotropic case for simplicity because the treatment of the 
isotropic case is completely analogous. 

Natural images have small total-variation due to the low-dimensionality of their subset of pixels 
representing edges. As such, searching for the image with smallest total-variation that matches a 
set of measurements, the convex relaxation of searching for the image with fewest edges, is a natural 
choice for image reconstruction. In the context of compressed sensing, the measurements y E C"^ 
from an image X are of the form y = M.{X) + ^, where is a noise term and M. : C^^^ — )• C™' 
is a linear operator defined via its components by 

[M{X)]j = {Mj,X) = trace(MjX*), 

for suitable matrices Mj. Here and throughout, M* denotes the adjoint of the matrix M. Total- 
variation minimization refers to the convex optimization 

X = argmin \\Z\\tv such that \\M{Z) - y\\2 < e. (TV) 
z 

The standard theory of compressed sensing does not apply to total-variation minimization. In fact, 
the gradient transform Z — >• V[.Z] not only fails to be orthonormal, but viewed as an invertible 
operator over mean-zero images, the Frobenius operator norm of its inverse grows linearly with N. 
This poor conditioning would lead to magnification of error even if the usual CS techniques could 
be applied. 

Despite this, total-variation minimization (TV) is widely used in applications and exhibits ac- 
curate image reconstruction empirically (see e.g. lUi IM \M \M \M \M \M \i3 
However, to the authors ' best knowledge there have been no provable guarantees that (TV) recovery 
is robust. 

Images are also compressible with respect to wavelet transforms. For example, in Figure [T] we 
display the image of Boats alongside its (bivariate) Haar wavelet transform. The Haar transform 
(like wavelet transforms more generally) is multi-scale, collecting information not only about local 
differences in pixel intensity, but also differences in average pixel intensity on all dyadic scales. 
Therefore, the level of compressibility in the wavelet domain is controlled by the total-variation 
semi- norm j22]. We will use in particular that the rate of decay of the bivariate Haar coefficients 
of an image can be bounded by the total- variation (see Proposition |6] in Section |4]). 

Recall that the (univariate) Haar wavelet system constitutes a complete orthonormal system 
for square-integrable functions on the unit interval, consisting of the constant function 



H'{t) 



1 < t < 1, 
0, otherwise. 
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the mother wavelet 

rrlm _ / 1 < * < 1/2, 

^ W - I _1 1/2 < t < 1, 

and dyadic dilations and translates of the mother wavelet 

Hn,k{t) = '^'''^H^{'^''t-k)- nGN, 0<;fc<2". (7) 

The bivariate Haar system comprises an orthonormal system for L2{Q), the space of square- 
integrable functions on the unit square Q = [0, 1)^, and is derived from the univariate Haar system 
by the usual tensor-product construction. In particular, starting from the multivariate functions 

H%u,v) = H'^{u)H'^{v), e = (61,62) eV = {{0, 0}, {0, 1}, {1, 0}, {1, 1}}, 

the bivariate Haar system consists of the constant function and all functions 

X = {u, v), Hlkix) = 2^H%2^x -k), e e V, j>0, G n 2^Q (8) 

Discrete images are isometric to the space S^v C L2{Q) of piecewise-constant functions 



-'N 



|/eL2(Q), fiu,v) = Cj,k, ^<n<^, ^<^;<A| (9) 



via the identification cj^k = ^Xj,k- If N = 2'^, The restriction of the bivariate Haar basis to the 
Haar basis functions indexed by j < n — 1, arranged as an AT x A'" array such that the + 1)^ 
bases with index j < i form in the upper left ix i subarray and identified as discrete images Hji^y, 
generates the orthonormal bivariate Haar transform T-L : C^^^ — >■ with components 

[n{x)]j,k = {Hj,k,x) (10) 




(a) (b) 
Figure 1: (a) Original boats image and (b) its Haar coefficients. 

The bivariate Haar transform is orthonormal, and so standard CS results guarantee that images 
can be reconstructed up to a factor of their best approximation by s Haar basis functions using 
m > s log{N) measurements. One might then consider ^i-minimization of the Haar coefficients as 
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an alternative to total-variation minimization. However, total variation minimization (TV) gives 
better empirical image reconstruction results than £i-Haar wavelet coefficient minimization, despite 
not being fully justified by compressed sensing theory. For details, see [10[ [TH [23 j and references 
therein. For example. Figures [2] and [3] show the reconstruction of the Shepp-Logan phantom image 
and the cameraman image using 20% of their discrete Fourier coefficients (selected at random) . The 
image recovered via total variation minimization (TV) is not only more pleasing to the eye, but 
has a much lower recovery error than that of the image recovered via Haar minimization, e.g. {BLi) 
with orthonormal transform B = %. 

In the case of noise. Figure |4] displays the original Fabio image, corrupted with additive Gaussian 
noise. Again, we compare the performance of (TV) and [BLi) at reconstruction using 20% Fourier 
measurements. As is evident, TV-minimization outperforms Haar minimization in the presence of 
noise as well. Another type of measurement noise is a consequence of round-off or quantization 
error. This type of error may stem from the inability to take measurements with arbitrary precision, 
and differs from Gaussian noise since it depends on the signal itself. Figure [5] displays the lake 
image with quantization error along with the recovered images. As in the case of Gaussian noise, 
TV-minimization outperforms Haar minimization. All experiments here and throughout used the 
software £i-magic to solve the minimization programs 




(a) 



(b) 



Figure 2: (a) Original 256 x 256 Phantom image and its reconstruction from 20% randomly- 
selected Fourier coefficients using (b) total-variation minimization (TV) and (c) £i-minimization of 
its bivariate Haar coefficients. 



We note that the use of total variation regularization in image processing predates the theory 
of compressed sensing. The seminal paper of Rudin, Osher, and Fatemi introduced total-variation 
regularization in imaging [56] and subsequently total-variation has become a regularizer of choice 
for image denoising, deblurring, impainting, and segmentation [9l 140 1 l^^l 116^ [15j. For more details 
on the connections between total- variation minimization and wavelet frame-based methods in image 
analysis, we refer the reader to [6]. 



1.2 Contribution of this paper 

Although theoretical guarantees have been obtained guaranteeing recovery via {TV) of images with 
exactly sparse gradients without noise |lH ll4j. to the authors' best knowledge no results have shown 
that this recovery is robust, despite strong suggestions by numerical evidence. In this paper, we 
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(a) (b) (c) 

Figure 3: (a) Original 256 x 256 Cameraman image and its reconstruction from 20% randomly- 
selected Fourier coefficients using (b) total-variation minimization and (c) ^i-minimization of its 
bivariate Haar coefficients. 




(a) (b) (c) 



Figure 4: (a) Original 256 x 256 Fabio image corrupted with Gaussian noise and its reconstruction 
from 20% randomly-selected Fourier coefficients using (b) total-variation minimization and (c) ii- 
minimization of its bivariate Haar coefficients 



prove precisely this. We show that {TV) robustly recovers images from few RIP measurements. 
The error guarantees are analogous to those of ([s]) up to a logarithmic factor: 

Theorem A. Let X G c^x^ q^^ti image with discrete gradient V[X]. Suppose we observe noisy 
measurements y = M{X) + ^ constructed from a matrix satisfying the RIP of order s, with noise 
level 11^ II 2 < Then the solution 

X = argmin ||Z||rv' such that \\M.{Z) — y\\2 < £ (11) 
z 

satisfies 

\\X-X\\2<C log(iVV.) + .) . (12) 

This error bound is optimal up to the logarithmic factor as we discuss below. To our best 
knowledge, this is the first near-optimal result on stable image recovery by total variation mini- 
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(a) 



(b) 



Figure 5: (a) Original 256 x 256 lake image corrupted with quantization noise and its reconstruction 
from 20% randomly-selected Fourier coefficients using (b) total-variation minimization and (c) ii- 
minimization of its bivariate Haar coefficients. 

mization. For details about the construction of the measurements, see Theorem [3] below and the 
remarks following. 



1.3 Previous theory for total- variation minimization 

The last few years have witnessed numerous algorithmic advances that allow the efficient imple- 
mentation of total- variation minimization (TV) . The recent split Bregman algorithm proposed by 
[26], based on the Bregman distance [5], is very efficient. Several algorithms are designed to exploit 
the structure of Fourier measurements for further speed-up; see for example |50l |3]. Image recon- 
struction via independent minimization of the partial derivatives Xx and Xy was observed in |19j 
to give superior empirical results. 

With respect to theory, it was shown in jl4j that if an image X has an exactly sparse gradi- 
ent, then {TV) recovers the image exactly from a small number of partial Fourier measurements. 
Moreover, using that the discrete Fourier transform commutes with the discrete gradient operator, 
one may change coordinates in this case and re-cast [TV) as an (.i program (Li) with respect to 
the discrete gradient image [H] to derive stable gradient recovery results. In this paper, we extend 
these Fourier-specific stable gradient recovery results to general RIP measurement ensembles. 

However, robust recovery of the gradient need not imply robust recovery of the image itself. To 
see this, suppose the error V[JC] — V[X] in the recovery of the gradient has a single non-zero 
component, of size a, located at pixel (1,1). That is, the gradient is recovered perfectly except 
at one pixel location, namely the upper left corner. Then based on this alone, it is possible that 
every pixel in X differs from that in X by the amount a. This accumulation of error means that 
even when the reconstructed gradient is close to the gradient of X, the images X and X may be 
drastically different, magnified by a factor of A^^! Even for mean-zero images, the error may be 
magnified by a factor of A^, as for images X with pixels Xj^k = j- We show that due to properties 
of the null space of RIP matrices, the (TV) reconstruction error X — X in ( [A] ) cannot propagate 
as such. 

Recent work in |47| presents an analysis co-sparse model which considers signals sparse in the 
analysis domain. A series of theoretical and numerical tools are developed to solve the analysis 
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problem (BLi) in a general framework. In particular, the analysis operator may be the finite 
difference operator, which concatenates the vertical and horizontal derivatives into a single vector 
and is thus closely linked with the total variation operator. Effective pursuit methods are also 
proposed to solve such problems under the analysis co-sparse prior assumption. We refer the 
reader to [17] for details. 

Finally, we note that our robustness recovery results for (TV) are specific to two-dimensional 
images, as the embedding theorems we rely on do not hold in one-dimension. Thus, our results 
do not imply robust recovery for one-dimensional piecewise constant signals. Robustness for the 
recovery of the gradient support for piecewise constant signals was studied in [49j • 



1.4 Organization 

The paper is organized as follows. Section [2] contains the statement of our main result about robust 
total variation recovery. The proof of our main result will occupy most of the remainder of the 
paper. We first prove robust recovery of the image gradient in Section [3j In Section [4] we derive 
a strong Sobolev inequality for discrete images lying in the null space of an RIP matrix which 
will bound the image recovery error by its total-variation. Our result relies on a result by Cohen, 
DeVore, Petrushev, and Xu that the compressibility of the bivariate Haar wavelet transform is 
controlled by the total- variation of an image. We prove the main theorem. Theorem by way 
of Theorem |3] in Section 14.11 We conclude in Section [5] with some brief discussion. Proofs of 
intermediate propositions are included in the appendix. 



2 Main results 

Our main result shows robust recovery of images via the total variation minimization program 
{TV). For simplicity of presentation, we say that a linear operator A : C^^^^^ — )> C™ has the 
restricted isometry property (RIP) of order s and level 6 £ (0, 1) if 

(1 - 5)\\X\\l < \\A{X)\\l < (1 + 5)\\X\\l for ah s-sparse X G C^ix^2. (13) 

Here and throughout, \\X\\p = [Yljk \-^j,krj denotes the entrywise ^p-norm of the image X, 



treating the image as a vector. In particular, p = 2 is the Frobenius norm 



V j,k 

This norm is generated by the image inner product 

{X,Y) = tvace{XY*). (14) 
Note that if the linear operator A is given by 

{A{X))^ = {Aj,X), 

then A satisfies this RIP precisely when the matrix whose rows consist of Aj unraveled into vectors 
satisfies the standard RIP as defined in ([T]). There is thus clearly a one-to-one correspondence 
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between RIP for linear operators A : C^^^^^ — )• and RIP for matrices $ G £^mx(NiN2) ^ g^^^ 
treat these notions as equivalent. 

Since we are considering images X G C^^^ rather than vectors, it will be helpful to first 
determine what form an optimal error recovery bound takes in the setting of images. In standard 
compressed sensing, the optimal minimax error rate from m > slog(A^^/,s) nonadaptive linear 
measurements is 

\\x-xh<c(^^^^^^ + ey (15) 

In the setting of images, this implies that the best possible error rate from m > slog(A^^/s) linear 
measurements is at best: 

l|X-X|b<cfll^W-'7™-+.V (16) 



Above, (V[X])s is the best s-sparse approximation to the discrete gradient V[X]. To see that 
we could not possibly hope for a better error rate, observe that if we could, we would reach a 
contradiction in light of the norm of the discrete gradient operator: || V[.Z]||2 < 4||.Z||2. 



Our main result guarantees a recovery error proportional to (16) up to a single logarithmic 
factor (log(iVV'S))- That is, the recovery error of Theorem [s] is optimal up to at most a logarithmic 
factor. Here and throughout we use the notation u> v to indicate that there exists some absolute 
constant C > such that u > Cv. We use the notation u < v accordingly. In this article, C > 
will always denote a universal constant that might be different in each occurrence. 

To change coordinates from the image domain to the gradient domain, it will be useful for us 
to consider matrices $o and obtained from a matrix $ by concatenating a row of zeros to 
the bottom and top of respectively. Concretely, for a matrix $ E £^{N-i)xN ^ denote by 
^0 g £^NxN augmented matrix with entries 

We denote similarly by $o the matrix resulting by adding an additional row of zeros to the bottom 
of *. 



We can relate measurements using the padded matrices (17) of the entire image to measurements 
of its directional gradients, as defined in ([4]). This relation can be verified by direct algebraic 
manipulation and so the proof is omitted. 

Lemma 2. Given X G C^^^ and * G c{N-i)xN ^ 

($,X,.) = (*o,X>-(*o,X) 

and 

(*,X,) = (*o,X^>-(*o,X^>, 
where X^ denotes the transpose of the matrx X . 

For a linear operator A : ([^i^-'^)^^ — ). with component measurements A{X)j = {Aj,X) 
we denote by A'^ : C^^^ — t- C™ the linear operator with components [^'^(X)]j = (^[A^]j, X^ We 
define A : C^^^ ^ similarly. 

We are now prepared to state our main result which guarantees stable image recovery by total 
variation minimization using RIP measurements. 
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Theorem 3. Let N = 2"". Let A : C^^ i)xAf _^ (j^mi /^^^g ^/^g restricted isometry property of order 
5s and level 6 < 1/3. Recall the bivariate Haar transform % : C^^^ — t- C^^^ as defined in (10) 



and let B : C^^^ C'"^ be such that the composite operator B o 'H^^ : c^^^ — >• C™^ /las i/ie 
restricted isometry property of order 2s and level 6 < 1. 

Let m = Ami + ''tT'2, o,nd consider the linear operator Ai{X) : <C^^^ — )• C" with components 

M{X) = (a''{X),Ao{X),A\X^),Ao{X^),B{X)). (18) 



If X ^ (j^iVxiV j^^g discrete gradient V[X] and noisy measurements y = A4{X) +^ are observed 
with noise level ||^||2 < then 



satisfies 



X = 8iVgmm\\Z\\Tv such that \\M{Z) — y\\2 < e (19) 
z 



||V[X-X]||,<^a^^™k+,, (20) 
\\X - X\\tv < ||V[X] - V[X],||i + V^e, (21) 
|X-X|b<log(^V^)f^^^,^™+.y (22) 



and 



To our best knowledge, Theorem [3] provides the first provable guarantee of robust recovery for 
images from compressed measurements via total variation minimization. Since we only require the 
RIP on the measurements generating the operator Ai, Theorem |3] implies Theorem [X} 

Remarks. 

1. In light of (16), the gradient error guarantees (20) and ( |21[ ) provided by the theorem 



are optimal, and the image error guarantee (22) is optimal up to a logarithmic factor, which we 



conjecture to be an artifact of the proof. We also believe that the 4mi measurements derived from 
A in the theorem, which are only used to prove stable gradient recovery, are not necessary and can 
be removed. 

2. The RIP requirements in Theorem [3] mean that the linear measurements can be generated 
from standard RIP matrix ensembles. For example, they can be generated from a subgaussian 
random matrix $ G W^^^ with m > slog(A^^/s) or a partial random Fourier matrix Fq G 
£^mxN -^[ii^ > s\o^{N) with randomized column signs |30| . The latter measurements admit 
0{N'^ log A^) matrix-vector multiplication routines and so the implementation of (TV) can be sped 
up considerably for such structured measurements. With access to such measurement matrices, 
the measurements M.{X) in Theorem |3] can be constructed by applying such RIP matrices to X 
and X^ . We emphasize again that the particular structure of the measurements (18) is likely not 
necessary and only a requirement of the proof. 

3. We have not tried to optimize the dependence on the values of the restricted isometry 
parameters in Theorem |3j Refinements such as those in standard compressed sensing may yield 
improvements in the conditions. 

4. Theorem [3] requires the image side-length to be a power of 2, = 2". This is not actually a 
restriction, as an image of arbitrary side-length G N can be reflected horizontally and vertically 
to produce an at most 2A^ x 2A^ image with the same total-variation up to a factor of 4. 
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The remainder of the article is dedicated to the proof of Theorem [3j which we break into two 
parts. The proof of Theorem [s] is two-part: we first prove the bounds (20) and (21) concerning 
stable recovery of the discrete gradient. We then prove a strengthened Sobolev inequality for images 
in the null space of an RIP matrix, and stable image recovery follows. 



3 Stable gradient recovery for discrete images 



In this section we prove statements (20) and (21) from Theorem |3j showing that total- variation 
minimization recovers the gradient image robustly. We will appeal to the following proposition 
which is used implicitly in many results in compressed sensing concerning the recovery of sparse 
signals using li minimization. It allows us to bound the norm of an entire signal when the signal 
(a) is close to the null space of an RIP matrix and (b) obeys an ii cone constraint. In particular, 



(|24) is just a re-statement of results in 
23 



while (25) follows from (24) and the cone-constraint 



The proof of Proposition [4] is contained in the appendix for completeness 



Proposition 4. Suppose that A : C^^^ C*" satisfies the restricted isometry property of order 
5s and level 5 < 1/3, and suppose that the image D G c^x^ satisfies a tube constraint 



\\AD)h 



< 



Suppose further that for a subset S C {1, 2, . . . , A^}^ of cardinality \S\ = s, D satisfies a cone- 
constraint 

\\Ds4i<\\Ds\\i + C (23) 



Then 



and 



\D\\i<^ + ^e. 



(24) 



(25) 



We note that neither the RIP level of 5s or the restricted isometry constant 6 < 1/3 are sharp; 
for instance, an RIP level of 2s and restricted isometry constant 62s ~ .4931 are sufficient for 
Proposition |4] [36, 7j. 

We now show that the gradient of the image is recovered stably by TV-minimization. 



3.1 Proof of stable gradient recovery, (20) and (21) 

Since A : C^^^ — )• satisfies the RIP, in light of Proposition [ij if suffices to show that the 
discrete gradient V[X — X], regarded as a vector, satisfies the tube and cone constraints. 



Let D = X - X, and let Ai, A2, 



Arm be such that 



AiZ)j = {Aj,Z). 



Cone Constraint. The cone constraint holds by minimality of X = X — D. Indeed, by this and 
the fact that X is also a feasible solution, letting S denote the support of the largest s entries 
of V [X] , we have 



12 



||V[X]5||i - ||V[£>]5||i - \\V[X]s4i + l|V[£>]s^||i 

< \\V[X]s + V[D]s\\i + \\V[X]sc + V[D]s4i 
= l|V[X]||i 

< l|V[X]||i 

= ||V[X]5||i + ||V[X]sc||i 

Rearranging, this yields 

\mD]s4i < II V[£>]s||i + 2|| V[X] - V[X],||i. 
Tube constraint. First note that D satisfies a tube constraint, 



< 6e^ 



\\M{D)\\l < 3\\M{X)-y\\l + 3\\M{X)-y\\l 

Now by Lemma [2j 



< 2\{[Ajf,D)\' + 2\{[Aj]o,D)\' (26) 



|(A„£>,)|2 = \{[Ajf,D)-{[Aj]o,D 

L/,D>|2 + 2|([A, 

and 

\{A„Dy)\' = \{[Aj]',D^)-{[Aj]o,D^)\' 



< 2\{[Aj]',D^)f + 2\{[Aj],,D^)\' (27) 



Thus V[Z)] also satisfies a tube-constraint: 

m 

ummwl = j]i(A„i?.)|2 + i(A,-,D,)|2 

< nM{D)\\i 

< I2e^. (28) 

Proposition |4] then completes the proof. 

4 Proof of Theorem [3] by a strengthened null-space Sobolev in- 
equality 

As a corollary of the classical Sobolev embedding of the space of functions of bounded variation 
BV{^^) into L2(M^) [I], the Frobenius norm of a mean-zero image is bounded by its total- variation 
semi-norm. We recall the proof of this corollary in the appendix. 
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Proposition 5 (Sobolev inequality for images). Let X G 1^2" ^ mean-zero image. Then 

IIXII2 < ^||X||ry (29) 



In light of the total- variation error estimate (21), the Sobolev inequality allows a preliminary 



estimate for the image error, assuming it is mean-zero: 

\\X - X\\2 < II V[X] - V[X]5||i + Vse. 

We will be able to derive a sharper bound on the error by looking to a deep and nontrivial theorem 
from |19j which says that the bivariate Haar coefficient vector of a function / € BV{Q) on the 
unit square Q = [0, 1)^ is in weak £1, and its weak £i-norm is proportional to its bounded- variation 
semi- norm. As a corollary of that result, we can bound the magnitude of the A;th- largest bivariate 
Haar coefficient of an image by the total variation of the image: 

Proposition 6. Suppose X G C^^^ , and let C(fc)(JC) be the bivariate Haar coefficient of X 
having kth largest magnitude, or the entry of the bivariate Haar transform 1-L{X) having kth largest 
magnitude. Then for all k>\, 

|c(fc)(X)|<cHlK 

The derivation of Proposition [6] from Theorem 8.1 of [19] is provided in the appendix. 

Proposition |6] bounds the decay of the Haar wavelet coefficients by the image total- variation 
semi- norm. At the same time, vectors lying in the null space of a matrix with the restricted isometry 
property must be sufficiently flat, with the ^2-energy in their largest s components in magnitude 
bounded by the ii norm of the remaining components (the so-called null-space property) [18]. As 
a result, the norm of the bivariate Haar transform of the error, and thus the norm of the error 
itself, must be sufficiently small. Specifically, the error X — X satisfies a Sobolev inequality that is 



stronger than the standard inequality (29) by a factor of \og{N / s) / y/s . 



Theorem 7 (Strong Sobolev inequality). Let B : C^^^ — )■ C™" be a linear map such that 
B o : <C^^^ — )• C™" has the restricted isometry property of order 2s and level 6 < 1, where 
H : C^^^ — )• C^^^ is the bivariate Haar transform. Suppose that D G ([^NxN g^Ugji^g ifi^ tube 
constraint \\B{D)\\2 < £■ Then 

\\Dh<{}^^)\og{N^/s) + e. (30) 

Proof. Let Y = 'H{D) G (j^A^xAf bivariate Haar transform of D, and let cq) G C be the 

jth-largest entry (pixel) of Y in absolute magnitude. 

Decompose Y = I5 -|- Ysc where Ys is the s-sparse image consisting of the s largest-magnitude 
entries of Y. Write Ygc = Y^^^ + Y^"^^ -|- • • • -|- Y^^'^ where r = [^J, where 1^*^^^ is the s-sparse 
image consisting of the s largest-magnitude entries of Ys'c, and so on. 



14 



By Proposition |6| we know that |c(j)| < C||Z)||Ty/j- Then 

\\Ys4i -- 



j=s+l 



Af2 



< c\\d\\tv y - 

j=s+l ■> 

< C\\D\\Tv'^og{N^/s) 



(31) 



where the second inequahty follows from properties of the geometric summation. We can similarly 
bound the ^2-iiorm of the residual image: 



W l|2 



j=s+l 



< Ci\\D\\Tv? E 7. 



j=s+l 



< C{\\D\\Tvf/s, 



(32) 



obtaining ||ls"=||2 < C\\D\\tv/V^- 

We now use the assumed tube constraint for D and restricted isometry property for B o 

e > mD)h 

r 

> \\B o n-HYs + l^('))||2 - lie o 7^-i(i-(i))||2 

i=2 



> (1 - 6)\\Ys + 1^W||2 - (1 + 5) E ||r(^')||2 



i=2 



> (1- 5)11^5112 -(1 + 5)^ 



i=i 



> (1 



1 



{l + 6)^\\Ys4i 



Combined with the £i-norm bound (31) on Ysc this gives 



l^5||2 < e + log(iVVs) 



li:>l 



TV 



This bound together with the ^2-tail bound (32) gives 



|£>||2 = ||1^||2 < III5II2 + 111^5^12 < e + log(iVVs) 



I-DI 



TV 



(33) 



(34) 



(35) 



where the first equality follows by orthonormality of the Haar transform. This completes the 
proof. □ 
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4.1 Proof of Theorem |3] 



We now have all the ingredients to prove our main result Theorem [3} 



Proof. Since bounds (20) and (21) were already proven in Section |3.1[ it remains to prove the 
final stability bound (22). Since the measurements of X are of the form (18), the image error 
D = X — X satisfies the tube-constraint ||i3i?||2 < £• Thus we may apply Theorem [t] and then 
the total- variation bound (21) on D which shows 



Il^-^ll2 

< e + log(iVVs 

< e + log(iVVs 

< log(iVVs) 
This completes the proof of Theorem [3j 



\D\ 



TV 



\v[x]-v[xUi + ^£ 



V\X] - V\X' 



s\\l 



+ e 



□ 



5 Conclusion 

Compressed sensing techniques provide reconstruction of compressible signals from few linear mea- 
surements. A fundamental application is image compression and reconstruction. Since images are 
compressible with respect to wavelet bases, standard CS methods such as ^i-minimization guarantee 
reconstruction to within a factor of the error of best s-term wavelet approximation. The story does 
not end here, though. Images are more compressible with respect to their discrete gradient repre- 
sentation, and indeed the advantages of total-variation (TV) minimization over wavelet-coefficient 
minimization have been empirically well documented (see e.g. [lOi illj). It had been well-known 
that without measurement noise, images with perfectly sparse gradients are recovered exactly via 
TV-minimization [14J. Of course in practice, images do not have exactly sparse gradients, and 
measurements are corrupted with additive or quantization noise. To our best knowledge, our main 
result, Theorem |3] is the first to provably guarantee robust image recovery via TV-minimization. 
In analog to the standard compressed sensing results, the number of measurements required for 
reconstruction is optimal, up to a single logarithmic factor in the image dimension. 
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A Proofs of Lemmas and Propositions 
A.l Proof of Proposition [4] 

Here we include a proof of Proposition El which is a re-statement of results in [JLlJ. 
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Proof. Suppose that D G ([^^^^ obeys the cone constraint 

\\Ds4i < \\Ds\\i + C (36) 

and the tube constraint ||^(Z))||2 < £■ 

We write D'^ = Ds^ = U^^^ + D^^^ H h -D*^'') where r = [^J. Here D^^'^ is the 4s-sparse 

image consisting of the 4s largest components in magnitude of D'^, D^"^^ consists of the next 4s 
largest components in magnitude of D^, and so on. Since the magnitude of each component of 
^O'-i) ig larger than the average magnitude of the components of D^^\ 

\\D^'^\\2< " , J = 2,3,... 

Combining this with the cone constraint gives 

E S ^IIDl, < ^J\Dsh + ^« < + ^? (37) 

Now combined with the tube constraint and the RIP, 



e > UD\\2 

r 

> \\A{Ds + D^'^)\\2-J2\\AiD(n\2 



i=2 



> Vl^\\Ds + D^^^\\2-VT+SY,\\D^^M\2 



i=2 



> VT^\\Ds + D^^^\\2-Vl + 6(^^\\Dsh + ^^ 



Then since 5 < 1/3, 



\Ds + D^^^\2 <3e + 



2e 



Finally, because || J2'j=2 I^^^^h < E}=2 ll^^-^'^lb < U^s + -0^2 + 273?' ^e have 



3^ , e 



D 2 < 5e + ^ + 



confirming (24). 



To confirm ( 25 ) , note that the cone constraint allows the estimate 



|£>||i < 2\\Ds\\i + ^ 

< 2^\\Dsh + ^ 

< 2V^(3. + |)+^ 



(38) 



(39) 

□ 
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A. 2 Proof of Proposition [5] 

Here we recall the proof of the Sobolev inequality for images. Without loss, we assume that 
X E c^x^ is mean-zero in the sequel. 

Proof. For any 1 < k < i < N we have, 

l=k 
i-l 



l=k 
N-l 

< + ^ (40) 

As the same result holds by backward- iteration if /c > i, averaging these bounds yields 

^ N N~l 
fc=l £=1 

Similarly, by reversing the order of indices we also have 

^ N N-l 

k=l 1=1 

For ease of notation let 

N N-l 



N 

k=l k=l 

and let 

TV 7V-1 



^^l^J AT 

k=l k=l 

Combining the two bounds on Xij results in the bound < /(j) • g{i). 

It follows 

N N N N 

j=l j=l j=l i=l 



\j=i i=i I 

^ / Af N-l N N-l 

- 4 ■ \^k+i,j - Xkj\ + |Xi,fc+i - Xi^k 

\j=l k=l i=l k=l 

< \mx]\\i 

= \\\XfTV (42) 
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Note the second-to-last inequality holds because X is mean-zero. □ 
A. 3 Derivation of Proposition [6] 

Recall that a function /(u, v) is in the space Lp{il) (1 < p < oo) if 



L,{n) ■■= ( / \f{xWdx) ^ < oo, 



and the space of functions with bounded variation on the unit square is defined as follows. 

Definition 8. BV{Q) is the space of functions of bounded variation on the unit square Q := 
[0, 1)^ C M^. For a vector v G M?, we define the difference operator A^, in the direction of v by 

^v{f,x) := f{x + v)- fix). 

We say that a function f S Li{Q) is in BV{Q) if and only if 

2 2 
Vqif) ■.= SUph''^^\\Ahe,{f,-)\\L^{Q{he,)) = l^^h'^ ^ W^he.if , ■)\\L^iQ(he,)) 

is finite, where ej denotes the jth coordinate vector. Here, the last equality follows from the fact 
that \\Afiejif, ■)\\li{Q) subadditive. Vq(/) provides a semi-norm for BV: 

l/lw(Q) ■■=VQ{f). 

Theorem 8.1 of [19] bounds the rate of decay of a function's bivariate Haar coefficients by its 
bounded variation semi-norm. 

Theorem 9 (Theorem 8.1 of [19j). Consider a function f G BV{Q) and its bivariate Haar coeffi- 
cients arranged in decreasing order according to their absolute value, Cf^f.-jif). We have 

C(fc)(/) < <-i— ^ 

where Ci = 36(480\/5 + 168\/3). 

As discrete images are isometric to piecewise-constant functions of the form ([9]), the bivariate 
Haar coefficients of the image X G d^^x^ ^re equal to those of the function fx £ L2{Q) given by 

fx{u,v) = NXi^^, i_l<n<^, l^<v<j^, l<i,j<N. (43) 

To derive Proposition [6j it will suffice to verify that the bounded variation of fx can be bounded 
by the total variation of X. 

Lemma 10. |/x|_BV ^ II^IItv 
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Proof. For h < j^, 

^hei {fx,{u,v)) 



and 
Then 

\fx\BV 



0, else. 



AT 



^he2{fx, {U,V)) 



0, else. 



lim — 
h^o h 



/ / \fx{u + h,v) - fx{u,v)\ dudv + / \fx{u,v + h) - fx{u,v)\ dvdu 

JO JO JO JO 



N-1 N-1 



N-1 N-1 

_;=1 1=1 1=1 j=l 

< ll-X'llj'y 



(44) 

□ 
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